Unraveling the hierarchical genetic structure of tea green leafhopper, Matsumurasca onukii, in East Asia based on SSRs and SNPs

Abstract Matsumurasca onukii (Matsuda, R. (1952). Oyo‐Kontyu Tokyo, 8(1): 19–21), one of the dominant pests in major tea production areas in Asia, currently is known to occur in Japan, Vietnam, and China, and severely threatens tea production, quality, and international trade. To elucidate the population genetic structure of this species, 1633 single nucleotide polymorphisms (SNPs) and 18 microsatellite markers (SSRs) were used to genotype samples from 27 sites representing 18 geographical populations distributed throughout the known range of the species in East Asia. Analyses of both SNPs and SSRs showed that M. onukii populations in Yunnan exhibit high‐genetic differentiation and structure compared with the other populations. The Kagoshima (JJ) and Shizuoka (JS) populations from Japan were separated from populations from China by SNPs, but clustered with populations from Jinhua (JH), Yingde (YD), Guilin (GL), Fuzhou (FZ), Hainan (HQ), Leshan (CT), Chongqing (CY), and Zunyi (ZY) tea plantations in China and the Vietnamese Vinh Phuc (VN) population based on the SSR data. In contrast, CT, CY, ZY, and Shaanxi (SX) populations clustered together based on SNPs, but were separated by SSRs. Both marker datasets identified significant geographic differentiation among the 18 populations. Various environmental and anthropogenic factors, including geographical barriers to migration, human transport of hosts (Camellia sinesis [L.] O. Kuntze) and adaptation of M. onukii to various local climatic zones possibly account for the rapid spread of this pest in Asia. The results demonstrate that SNPs from high‐throughput genotyping data can be used to reveal subtle genetic substructure at broad scales in r‐strategist insects.


| INTRODUC TI ON
Recent methodological advances in population genetic and data analysis have greatly facilitated our understanding of genetic diversity, population structure, and microevolution of insect pests, which can guide the prediction of evolutionary potential (such as environmental adaptability and population fecundity) and facilitate ecological regulation and management (such as biocontrol technology and pest behavior regulation) (Roderick, 1996;Roderick & Navajas, 2003). However, detecting subtle genetic differentiation remains a major challenge because many species, particularly r-strategists, have large effective population sizes and high gene flow due to long-distance dispersal or humanmediated movement on crop plants (Matsumoto et al., 2013;Mun et al., 1999;Shi et al., 2012). In addition to exhibiting subtle levels of genetic differentiation, such populations may display diverse patterns of genetic structure, including classical patterns of isolation by vicariance, host, regional clustering, origin, and dispersion (Du et al., 2020;Goodman et al., 2019), as well as complex patterns that lack clear spatial structuring or that may be influenced by anthropogenic factors and habitat (Jiang et al., 2020;Kareem et al., 2018;Raszick et al., 2019). Identifying subtle complex genetic patterns remains a critical objective for developing effective control measures because some pest management strategies require detailed knowledge of genetic diversity and genetic structure determination that can underpin downstream efforts to measure contemporary and historical migration, delineate groups within populations, and infer evolutionary history (Jiang et al., 2020;Raszick et al., 2019).
Previous comparative morphological studies revealed obvious intraspecific variation among populations, with individuals from different populations in southern China differing in the shape of membranous flanges and lengths of spiny protuberances on the shaft of the male aedeagus (Qin et al., 2015;Xu et al., 2021). This pest currently is widely distributed and causes considerable yield loss in diverse cultivated varieties of Camellia sinesis (L.) O. Kuntze in Japan, Vietnam, and China that are grown under a variety of ecological conditions (Dworakowska, 1971(Dworakowska, , 1982Qin et al., 2015). Based on these prior morphological observations, the broad distribution of the species and the differences in environmental conditions among tea-growing regions, we hypothesized that these morphological variations reflect genetic differences among geographic populations.
Microsatellite-based results revealed some geographical differentiation between populations, particularly between two populations in Yunnan and the other Chinese populations but provided only limited evidence of consistent population genetic structure, potentially affected by endogenous properties of the markers used (e.g., fluctuations in allele frequency and recombination rates), the quality of genetic data and the sampling methods. For the present study, we obtained more extensive data using next-generation sequencing (NGS) to more fully characterize the genetic substructure in M.
onukii population in East Asia.
The NGS methods based on reduced-representation genome sequencing (RRGS) have become widely used in population genetic analyses, particularly genotyping-by-sequencing (GBS) and restriction-associated DNA sequencing (RAD-seq) (Andrews et al., 2016;Davey et al., 2011;Elshire et al., 2011). These methods allow rapid detection of novel nuclear markers at the minimal cost.
Genome-wide SNPs can provide more power for understanding population dynamics and pest species than mitochondrial DNA or microsatellite (SSR) markers (Emerson et al., 2010;Liu et al., 2019).
While SSR (e.g., high polymorphism) can provide estimates of allelic distribution across whole populations, SNPs can compensate for the drawbacks of SSRs (e.g., homoplasy, complex mutation pattern, and high prevalence of null allele) (Coates et al., 2009;Kraus et al., 2015;Morin et al., 2004). Analysis of hundreds to thousands of SNPs per population has also allowed for the use of more sophisticated tools to address questions of genetic differentiation, complex patterns, and connectivity between populations (Dowle et al., 2017;Havill et al., 2017Havill et al., , 2019Ma et al., 2020).
In the present study, microsatellite markers were coupled with high-throughput sequencing approaches to obtain three goals: (i) to assess the spatial distribution of genetic variation of M. onukii populations in East Asia based on the microsatellite markers, followed by more detailed analysis of genetic differentiation using two types of molecular markers; (ii) to compare population relationships and levels of geographic differentiation among populations from the performances of SNPs and microsatellite markers and determine if patterns reflect morphological variation among populations; and (iii) to estimated historical migration between genetic clusters for inferring the evolutionary history of M. onukii populations in East Asia.
This study will reveal correlated patterns of morphological variation and hierarchical genetic structure in East Asian M. onukii population, which will be useful in monitoring species distribution and future population dynamics. This will be enable better prediction, regulation, and management of M. onukii populations in different tea producing regions of East Asia.

| Sampling and species identification
Tea green leafhopper specimens of 27 representative populations, including 24 Chinese localities, two populations from Japan (Kagoshima and Shizuoka), and one population from Vietnam (Vinh Phuc), were collected during the seasons of tea green leafhopper occurrence from 2013 to 2020 (Table S1) Figure 1).
All the collection sites were areas of more than 6 ha under continuous tea cultivation without weeds or trees. Adult specimens were captured by sweep net in five collecting points (10 m 2 ) near the center of each site, spaced 100 m or more apart. Specimens from all collecting sites within a location were pooled to represent one population. All these specimens were preserved in 100% ethanol and

| DNA extraction
Thirty male adults in each population, except for the dissected male genitalia used for species identification, were used for extracting DNA using either the CTAB method or DNeasy blood and tissue kits (Qiagen). The quality of DNA was detected and quantified with a Nanodrop One (Thermo Scientific).
All forward primers were labeled with fluorescent markers (FAM, HEX, and TAMRA). DNA from each individual was amplified for these markers in a 10 μl volume. PCR reactions followed the protocol described by Zhang et al. (2016). All the products were pooled and genotyped by automated capillary electrophoresis on an ABI 3130xl at Sangon Biotech. Microsatellite alleles were scored using the Microsatellite Plugin v.1.4 in Geneious v. 7.1.7 (Biomatters Ltd).
The fragment lengths of different loci in different populations were calculated according to the molecular internal reference (Liz 500).
All the genotypic data conversions were performed with GenALEx v.
F I G U R E 1 Geographic distribution and populations of Matsumurasca onukii. Population codes are listed in Table S1. SimpleMappr was used to produce a distribution map based on the geographical coordinates in Table S1. http://www.simpl emappr.net/#tabs=0. Structure v 2.3.4 was used to estimate the number of genetically distinct clusters (K) among the M. onukii populations using an admixture ancestry model with a burn-in period of 5 × 10 4 iterations and 10 6 Markov Chain Monte Carlo (MCMC) repetitions. The range of possible clusters was set from 1 to 27 or to 18, with 20 independent runs for each K (Pritchard et al., 2000;Pritchard et al., 2010). The optimal K-values in M. onukii populations was estimated by posterior probability of the data (L (K)) and the ad hoc statistic (ΔK) using the Web server Clumpak was used to align results from replicate analyses and visualize population structure (Evanno et al., 2005;Kopelman et al., 2015;Pritchard et al., 2000;Pritchard et al., 2010).
Principal component analysis (PCoA) was performed by GenALEx v.  Buffer. The reaction system was incubated for 1 h and inactivated for 30 min at 65°C. The Barcode DNA provided special identification tags for each sample, and the included universal adaptor that also provided binding sites for sequencing primers. All the DNA fragments (300-600 bp) were pooled in a sequencing library after purification and recovery. Following amplification with sequencing primers, purification, and recovery, the quality of the library was assessed using an Aglient 2100 Bioanalyzer, KAPA Library Quantification Kit and Qubit 2.0. The libraries were then sequenced on an lllumina HiSeq 2500 platform using a 150 bp Paired End protocol at Allwegene Technologies.

| SNP genotyping
Raw reads were obtained by Base calling, assessed for quality, and filtered by length of fragment, Q-value, and sequence alignment in databases. The identification of SNPs in Raw reads was accomplished using the Universal Network Enabled Analysis Kit (UNEAK) in TASSEL 3.0 (Bradbury et al., 2007;Lu et al., 2012Lu et al., , 2013. First, these reads were assigned to different individuals and populations. Following UNEAK, tags of different populations, consisting of the same reads, were used to construct network topologies. Missequenced reads, repetitive sequences, and parahomologous genes were rejected in order to obtain SNPs with high accuracy. Sequencing depth and coverage, SNP calling and genotyping were processed using BWA v. 0.7.15 and SAMtools, respectively ). In population-genetic filters, the minor allele frequency was less than 0.10. Finally, SNPs with minimum allele frequency <0.01 (Q20 > 95%, Q30 > 90%), coverage more than four and sequencing error rate not <0.05, were retained for subsequent population genetic analysis. Conversion from Fastq files to the desired formats was carried out using

| Isolation by distance analysis
Isolation by distance (IBD) analyses were conducted to test

| Historical gene flow estimates
Given high polymorphism and mutation rates, microsatellite markers were used to estimate population size and migration parameters among clusters obtained from the results of population structure analysis. The immigration rate (M) per generation and the mutationscaled population size (θ) were analyzed using the Bayesian inference with full model in MIGRATE v. 3.7.2 (Beerli, 2006). The effective population size (Ne) of each population was calculated with θ = 4Neμ (μ is mutation rate, 10 −4 per locus per generation for microsatellites) (Whittaker et al., 2003). Effective number of migrants per generation (Nem) was calculated as Nem = θM/4. The first run was estimated from F ST , and the other three runs were started with θ and M from the previous run. During the Bayesian search, one long chain with four independent replicates was used for each run with 10,000,000 generations after an initial burn in of 100,000 interactions. Heated chain was set to: 1.0, 1.5, 3.0, and 1000,000.

| Morphological variation
The morphological comparisons among specimens from differ-

| Genetic diversity and genetic structure revealing by SSRs
The HWE test showed that 18 microsatellite markers were suitable for further genetic analysis, with low-null allele frequencies, ranging  (CX, JD, MH, PE, LX, and MJ) were also clustered ( Figure S1a,b) in the principal coordinate analysis. The amova revealed that most of the variance was due to variation within populations (93.51%, FST = 0.065; p < .0001); although a significant fraction of the total variation was also due to variation between clusters (2.78%, FCT = 0.028; p < .0001) and between populations (3.71%, FSC = 0.038; p < .0001) ( Table 1) (Table S1).

| Population structure and SNP phylogeography
The PCoA and ML analyses yielded three genetic clusters long branch in the NJ tree. The optimal K for the SNP data inferred from FASTStructure ranged from 1 to 18 (Figure 4b), and the lowest cross-validation error (CV) was obtained for K = 3. This clustering condition, supported by amova, revealed that populations were genetically more differentiated across the three clusters (13.38%, FCT = 0.134; p < .0001) than between populations (4.32%, FCT = 0.050; p < .0001) (Figure 4; in all other sites, particularly populations in Japan ( Figure S4).

| Isolation by distance analysis
The traditional Mantel test identified significant geographic differentiation among 18 populations in both marker datasets (SNPs:  Figure S6).

| Historical gene flow estimates
Given the differences in results of genetic structure analyses based on SNPs versus microsatellites, estimates of gene flow among population clusters identified by both analyses were made. In order to clarify the relationship between populations in China and Vietnam, the VN population was also separated for gene flow estimates.
Effective cluster size calculated with population size parameter θ, ranged from 563 to 2053. Frequent exchange of migrants per generation was inferred to occur among clusters ( Figure 5). High levels of bidirectional migration were inferred between Clusters 3-1 and

When SX population or CT/CY/ZY populations were included in
Cluster 3-1 based on the results by SNP dataset, obvious asymmetric migration was also detected ( Figure S7).

| DISCUSS ION
Accurate assessment of genetic structure is a major aim of pest molecular ecology, as it can reveal patterns of association among

| Hierarchical pattern of M. onukii genetic structure based on microsatellites versus SNPs
As in the prior analysis based on microsatellites, analysis of SNPs showed that populations in the Yunnan tea production area of China are genetically differentiated from the other populations (Zhang et al., 2016(Zhang et al., , 2019. Nevertheless, the present analysis based on SNPs revealed an additional genetic cluster consisting only of the two Japanese populations (Cluster 3-3 in Figure 4a) which was not recognized by Zhang et al. (2016Zhang et al. ( , 2019, and also uncovered finerscale geographic differentiation. The findings are consistent with other studies on various species, such as Laodelphax striatellus, Plasmodium vivax, Carassius carassius, and so on (Fola et al., 2020;Jeffries et al., 2016;Liu et al., 2019), where SNPs outperformed microsatellite markers in accuracy and clarifying finer population structure.
Our results suggest that microsatellites and SNPs may capture somewhat different aspects of population structure in M. onukii although the reasons for these differences remain to be explored.
Microsatellite markers tend to exhibit higher levels of homoplasy, null alleles, and usually reveal a large number of alleles at a locus, resulting in underestimated genetic differentiation between subpopulations with high-genetic distance (e.g., Chinese populations and Japanese populations) and identify population structure between populations with similar levels of genetic diversity, which is consistent to many prior studies (Campos et al., 2017;Fola et al., 2020).
In contrast, SNPs have a fixed number of allelic states (Elshire et al., 2011), and SNPs with slower mutation rates may also facilitate inferring higher level of genetic differentiation and provide additional discrimination among genetic clusters (Coates et al., 2009;Morin et al., 2004). This may explain why the two Japanese populations formed their own separate cluster in the analysis based on Our results imply that different mutation rates of the two types of markers might reflect different stages of population evolutionary history, which can lead to differences in estimated population structure. Microsatellite markers, with higher mutation rates, tend to provide insights mostly on recent divergences by compared to SNPs

| Correlations between morphological and geographic distance and genetic variation
Although we detected no significant genetic substructure in most Chinese tea areas, our analyses of two types of markers revealed special genetic structure and high-genetic differentiation in M.
onukii populations from the Southwest tea areas, especially among the populations from Yunnan tea plantations that also exhibit morphological variation in the aedeagus. Such correlation between genetic and morphological variation contrasts with a previous study of Asymmetrasca decedens, in which no genetic differentiation was found between individuals of two morphotypes based on differences in the shape of the aedeagus, (Loukas & Drosopoulos, 1992).  (Kaundun et al., 2000;Wachira et al., 2001;Yao et al., 2008).
In contrast, high-genetic diversity and less genetic variation were revealed in the Vietnamese population as expected, given the ab-

| Genetic connectivity and population dynamics
Analyses of population structure and phylogeography based on the SNPs reveal significant genetic differentiation between M.
onukii populations in Yunnan (Cluster 3-2), Japan (Cluster 3-3) and the other tea plantations of China and Vietnam. Significant asymmetrical historical migration from Cluster 3-1 and Cluster 3-2 to Cluster 3-3 is supported by non-overlapping 95% confidence intervals ( Figure 5; Figure S7). Previous studies based on mitochondrial markers indicated that few private haplotypes are present in the Yunnan populations (Zhou et al., 2014;Zhu et al., 2019), and the putative primitive haplotype was also present in populations from Yunnan and neighboring Sichuan, Guizhou, and Chongqing Provinces (Chen et al., 2015;Li et al., 2013). According to genetic diversity, genetic structure characteristics, historical gene flow of M. onukii populations revealed by this study and historical expansion of tea cultivation (Takeo et al., 1992), we hypothesize that the tea green leafhopper in China spread from Yunnan or the southwest populations to the other tea areas in Jiangnan, South China and, ultimately, to Japan with domestication and cultivation (the historical sources), and then returned back to the origin by recent events (such as fresh tea processing and tea germplasm exchange) forming a new diffusion source pattern. As a result, the leafhopper was most likely introduced to Japan tea plantations through these two diffusion sources from China ( Figure 5; Figure S7). We found no evidence of frequent, long-distance directional dispersal, as has been observed in some other species of Empoascini (e.g., the potato leafhopper, Empoasca fabae) in M. onukii. Instead, our results suggest that natural or human-assisted dispersal occurs among major tea growing areas of China but not frequently enough to prevent maintenance of population-level differences between major tea areas and high levels of genetic variability within some populations.
Additional genetic substructure in M. onukii populations throughout East Asia was further elucidated in this study by SNPs, and high levels of genetic differentiation were observed between populations in different clusters. But our analyses failed to reveal fine-scale dispersal routes between populations. Incorporation of additional SNPs into future studies by more efficient genotyping with the M.

CO N FLI C T O F I NTE R E S T
The authors declare that there are no conflicts of interest.

B EN EFIT-S H A R I N G S TATEM ENT
Benefits from this research accrue from the sharing of our data and results on public databases as described earlier.